Spectral Sparse Representation for Clustering: Evolved from PCA, K-means, Laplacian Eigenmap, and Ratio Cut

Authors

  • Zhenfang Hu
  • Gang Pan
  • Yueming Wang
  • Zhaohui Wu
Abstract

Dimensionality reduction, cluster analysis, and sparse representation are among the cornerstones of machine learning. However, they appear unrelated to each other and are often applied independently in practice. In this paper, we find that spectral graph theory underlies a series of these elementary methods and unifies them into a complete framework. The methods include PCA, K-means, Laplacian eigenmap (LE), ratio cut (Rcut), and a new sparse representation method that we uncover, called spectral sparse representation (SSR). Further, extended relations are incorporated to conventional over-complete sparse representations (e.g., MOD, KSVD), manifold learning (e.g., kernel PCA, MDS, Isomap, LLE), and subspace clustering (e.g., SSC, LRR). We show that, under an ideal condition from spectral graph theory, PCA, K-means, LE, and Rcut are unified, and that when the condition is relaxed, the unification evolves into SSR, which lies between PCA/LE and K-means/Rcut and combines the merits of both sides: its sparse codes reduce the dimensionality of the data while revealing cluster structure. SSR exhibits many properties and admits clear interpretations, and owing to its inherent relation to cluster analysis, its codes can be used directly for clustering. The linear version of SSR is under-complete, complementing the conventional over-complete sparse representations. An efficient algorithm, NSCrt, is developed to solve for the sparse codes of SSR. By virtue of its good performance, the application of SSR to clustering, called Scut, reaches state-of-the-art performance in the spectral clustering family. Scut outperforms K-means-based spectral clustering methods, which depend on initialization and easily get trapped in local minima. The one-shot solution obtained by Scut is comparable to the best result of K-means over many runs. Experiments on data sets of diverse nature demonstrate the properties and strengths of SSR, NSCrt, and Scut.
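
As a rough illustration of why spectral graph theory can tie an eigenvector embedding such as LE to a partition such as Rcut, the sketch below uses a standard fact: when a graph splits into K disconnected components, the K smallest eigenvalues of the unnormalized Laplacian L = D - W are zero and the corresponding eigenvectors are constant on each component. Whether this is exactly the "ideal condition" the abstract refers to is an assumption here; the toy graph and the helper name `laplacian_embedding` are hypothetical, and this is not the authors' NSCrt or Scut algorithm.

```python
import numpy as np

def laplacian_embedding(W, k):
    """Return the k smallest eigenvalues/eigenvectors of L = D - W.

    Hypothetical helper for illustration only.
    """
    D = np.diag(W.sum(axis=1))            # degree matrix
    L = D - W                             # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return eigvals[:k], eigvecs[:, :k]

# Toy similarity graph (assumed): two fully connected groups,
# nodes {0, 1, 2} and nodes {3, 4}, with no edges between the groups.
W = np.zeros((5, 5))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)

eigvals, Y = laplacian_embedding(W, k=2)

# Each row of Y is the 2-D code of one node. Within a component the
# rows coincide; across components they differ.
same_within_a = np.allclose(Y[0], Y[1]) and np.allclose(Y[1], Y[2])
same_within_b = np.allclose(Y[3], Y[4])
differ_across = not np.allclose(Y[0], Y[3])
print(same_within_a, same_within_b, differ_across)
```

In this disconnected case the low-dimensional codes already reveal the clusters, so no separate K-means step is needed to read them off. Once edges are added between the groups, the codes are only approximately piecewise constant, which is the regime where a K-means post-processing step (or, per the abstract, SSR) comes into play.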

Similar resources

Robust Multi-body Motion Tracking Using Commute Time Clustering

The presence of noise renders the classical factorization method almost impractical for real-world multi-body motion tracking problems. The main problem stems from the effect of noise on the shape interaction matrix, which loses its block-diagonal structure and as a result the assignment of elements to objects becomes difficult. The aim in this paper is to overcome this problem using graph-spe...

Fusion of Thermal Infrared and Visible Images Based on Multi-scale Transform and Sparse Representation

Because visible and thermal infrared images differ, combining these two types of images is essential for better understanding the characteristics of targets and the environment. Thermal infrared images are most important for distinguishing targets from the background based on radiation differences, and they work well in all-weather and day/night conditions also in land s...

Regularized l1-Graph for Data Clustering

l1-Graph has been proven to be effective in data clustering, which partitions the data space by using the sparse representation of the data as the similarity measure. However, the sparse representation is performed for each datum independently without taking into account the geometric structure of the data. Motivated by l1-Graph and manifold learning, we propose Regularized l1-Graph (Rl1-Graph) ...

Sparse spectral clustering method based on the incomplete Cholesky decomposition

A new sparse spectral clustering method using linear algebra techniques is proposed. This method exploits the structure of the Laplacian to construct its approximation, not in terms of a low rank approximation but in terms of capturing the structure of the matrix. The approximation is based on the incomplete Cholesky decomposition with an adapted stopping criterion; it selects a sparse data set...

Spectral Clustering by Ellipsoid and Its Connection to Separable Nonnegative Matrix Factorization

This paper proposes a variant of the normalized cut algorithm for spectral clustering. Whereas the normalized cut algorithm applies the K-means algorithm to the eigenvectors of a normalized graph Laplacian for finding clusters, our algorithm uses a minimum volume enclosing ellipsoid for them. We show that the algorithm shares similarity with the ellipsoidal rounding algorithm for separ...

Journal title:
  • CoRR

Volume abs/1403.6290  Issue 

Pages  -

Publication date 2014